The genome sequence of the spotted Meliscaeva, Meliscaeva auricollis (Meigen, 1822)

We present a genome assembly from an individual female Meliscaeva auricollis (the spotted Meliscaeva; Arthropoda; Insecta; Diptera; Syrphidae). The genome sequence is 385.1 megabases in span. Most of the assembly is scaffolded into 5 chromosomal pseudomolecules, including the X sex chromosome. The mitochondrial genome has also been assembled and is 17.52 kilobases in length.


Background
Meliscaeva auricollis (Meigen, 1822), also referred to as the spotted Meliscaeva, is a small black and yellow hoverfly species with highly variable abdominal markings (Ball & Morris, 2015; Stubbs & Falk, 2002). Individuals are most frequent in southern and middle England, with records becoming increasingly sparse northwards into Scotland and Ireland (Ball & Morris, 2000; Stubbs & Falk, 2002). The oval-shaped, translucent larvae of this species are aphid predators and can be found among aphid-infested shrubs including barberries and broom, and also on the flowers and stems of white umbellifers (Ball & Morris, 2000; Rotheray, 1993; Stubbs & Falk, 2002). M. auricollis has a wide and sometimes unpredictable annual flight window and can be observed from February to December, with numbers usually at a peak in July to August (Stubbs & Falk, 2002). Notably, during certain years barely any individuals of the species are documented; the reason behind this remains unclear (Ball & Morris, 2015). Adult M. auricollis hoverflies can be found associated with trees at woodland edges, hedgerows and mature gardens (Ball & Morris, 2000).
The variable abdominal markings apparent in this species are dependent upon larval developmental temperatures. Larvae developing in colder winter months are distinctively darker than summer individuals and exhibit abdominal triangular yellow spots. In contrast, during warmer months individuals are lighter overall and display larger yellow abdominal bands (Ball & Morris, 2015; Stubbs & Falk, 2002). M. auricollis can be distinguished from M. cinctella, the other British member of the genus, through the elliptical shape of markings present on abdominal segment T2, which are broad and blunt-ended in M. cinctella (Ball & Morris, 2015). Adults can be found interacting with a large variety of flowers as well as basking upon sunlit leaves. Males have been spotted hovering above forest pathways and sunny tree branches (Ball & Morris, 2000).
The chromosomally complete genome sequence for Meliscaeva auricollis, generated as part of the collaborative Darwin Tree of Life Project, offers an opportunity to investigate and enhance our knowledge of this understudied hoverfly species.

Genome sequence report
The genome was sequenced from one female Meliscaeva auricollis (Figure 1) collected from Wytham Woods, Oxfordshire, UK (51.79, -1.32). A total of 35-fold coverage in Pacific Biosciences single-molecule HiFi long reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data. Manual assembly curation corrected 56 missing joins or mis-joins and removed 4 haplotypic duplications, reducing the scaffold number by 48.81%. The final assembly has a total length of 385.1 Mb in 42 sequence scaffolds with a scaffold N50 of 96.5 Mb (Table 1). The snailplot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.98%) of the assembly sequence was assigned to 5 chromosomal-level scaffolds, representing 4 autosomes and the X sex chromosome. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The order and orientation of contigs is uncertain over the repetitive region ~27,691 to 29,290 Mb on Chromosome 1. While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
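For readers unfamiliar with the scaffold N50 and N90 statistics quoted in this report, the following is a minimal Python sketch of how such a value is conventionally computed from a list of scaffold lengths. The function name `nx` and the example lengths are hypothetical illustrations, not the values in Table 2 and not the code used to produce the reported statistics.

```python
def nx(lengths, fraction=0.5):
    """Return the Nx statistic for a list of scaffold lengths.

    Nx is the length L such that scaffolds of length >= L together
    cover at least `fraction` of the total assembly span
    (fraction=0.5 gives N50, fraction=0.9 gives N90).
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    threshold = fraction * sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating span.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


# Hypothetical scaffold lengths in Mb, chosen only for illustration.
example_lengths = [119, 96, 80, 54, 36]
n50 = nx(example_lengths, 0.5)
n90 = nx(example_lengths, 0.9)
print(n50, n90)
```

With these illustrative lengths the two longest scaffolds already cover more than half of the total span, so the N50 equals the length of the second-longest scaffold.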

Sample acquisition and nucleic acid extraction
Two female Meliscaeva auricollis were netted in Wytham Woods, Oxfordshire (biological vice-county Berkshire), UK (latitude 51.79, longitude -1.32) on 2021-04-19. The specimens were collected and identified by Steven Falk (independent researcher) and preserved on dry ice. The specimen used for genome sequencing had ID Ox001271 (ToLID idMelAuri2), while the specimen used for Hi-C and RNA sequencing had ID Ox001258 (ToLID idMelAuri1).
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up. In sample preparation, the idMelAuri2 sample was weighed and dissected on dry ice (Jay et al., 2023). Tissue from the thorax was homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a). HMW DNA was extracted using the Automated MagAttract v1 protocol (Sheerin et al., 2023). DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30 (Todorovic et al., 2023). Sheared DNA was purified by solid-phase reversible immobilisation (Strickland et al., 2023): in brief, the method employs a 1.8X ratio of AMPure PB beads to sample to eliminate shorter fragments and concentrate the DNA. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer and Qubit dsDNA [...] Hi-C data were also generated from head tissue of idMelAuri1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.

Genome assembly, curation and evaluation
Assembly was carried out with Hifiasm (Cheng et al., 2021) and haplotypic duplication was identified and removed with purge_dups (Guan et al., 2020). The assembly was then scaffolded with Hi-C data (Rao et al., 2014) using YaHS (Zhou et al., 2023). The assembly was checked for contamination and corrected as described previously (Howe et al., 2021). Manual curation was performed using HiGlass (Kerpedjiev et al., 2018) and Pretext (Harry, 2022). The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) or MITOS (Bernt et al., 2013) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.

Wellcome Sanger Institute - Legal and Governance
The materials that have contributed to this genome Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Luis Diambra
Universidad Nacional de La Plata, La Plata, Argentina
This manuscript from Falk and Woodcock reports the genome of Meliscaeva auricollis. The raw genomic data, extracted from an individual female, were generated by Pacific Biosciences single-molecule HiFi sequencing. These reads were assembled using Hifiasm (35-fold coverage) and scaffolded with Hi-C data using YaHS. The assembly was scaffolded into 5 chromosomal pseudomolecules. Manual curation was performed using HiGlass. The authors provide the assembly statistics obtained by using BlobToolKit and BUSCO. The authors do not perform the genome annotation, despite the RNA sequencing. All these analyses are clearly explained in the protocols of the Tree of Life, which contain enough information to allow replication by other researchers. However, the authors do not mention the software/pipeline for the decontamination step (endosymbionts, etc.). They only mention the Howe 2021 paper. Could the authors give more details about that?
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bioinformatics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Background
It might be appropriate to add to the first paragraph, briefly, more information regarding the distribution range, conservation status and ecosystem role of the species. I would add one sentence (before the sentence about its higher frequency in England) on the distribution range of this species and its conservation status. See, for instance, the IUCN Red List: https://www.iucnredlist.org/species/149167300/149167303 At the end of the first paragraph you could add one more specific sentence about its role as pollinator, predator of aphids, etc., as you see fit. You could add information from the Global Biotic Interactions (GloBI) database [1], for instance: https://www.globalbioticinteractions.org/?interactionType=interactsWith&targetTaxon=Meliscaeva%20auricollis

Genome sequence report
Although the percentage of uncalled nucleotides (i.e., "N") is very low (0.0%; Figure 2), perhaps their number could be included somewhere. I think there are 32,837 uncalled nucleotides. Add this if you agree; it does not really matter.
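As an illustration of how such a count could be checked, the sketch below tallies uncalled bases in FASTA-formatted text. The function name `count_uncalled` and the toy sequences are hypothetical; this is not the authors' pipeline. Taking the tentative figure of 32,837 uncalled nucleotides together with the 385,165,895 bp span given in the Figure 2 legend, the fraction would be roughly 0.0085%, which is consistent with the 0.0% shown in the plot after rounding.

```python
import io


def count_uncalled(fasta_handle):
    """Count uncalled bases (N or n) and total bases in FASTA-formatted text.

    `fasta_handle` is any iterable of lines, e.g. an open file.
    Header lines (starting with '>') are skipped.
    """
    total = 0
    uncalled = 0
    for line in fasta_handle:
        if line.startswith(">"):
            continue
        seq = line.strip().upper()
        total += len(seq)
        uncalled += seq.count("N")
    return uncalled, total


# Toy input for illustration only; not data from the assembly.
toy_fasta = io.StringIO(">scaffold_1\nACGTNNAC\nGGnT\n>scaffold_2\nTTTT\n")
n_uncalled, n_total = count_uncalled(toy_fasta)
print(n_uncalled, n_total)
```

For a real assembly the same function could be given an open file handle on the multifasta file instead of the in-memory toy input.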

Methods
To increase reproducibility, I strongly recommend creating a GitHub page or a supplementary text file containing all the commands, with the selected parameters/options, used across the bioinformatic pipeline. This will be very useful for the whole scientific community.

Figure 2. Genome assembly of Meliscaeva auricollis, idMelAuri2.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 385,165,895 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (119,641,053 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (96,515,288 and 54,712,868 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the diptera_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CANUFD01/dataset/CANUFD01/snail.

Figure 3. Genome assembly of Meliscaeva auricollis, idMelAuri2.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CANUFD01/dataset/CANUFD01/blob.

Figure 4. Genome assembly of Meliscaeva auricollis, idMelAuri2.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/CANUFD01/dataset/CANUFD01/cumulative.

Figure 5. Genome assembly of Meliscaeva auricollis, idMelAuri2.1: Hi-C contact map of the idMelAuri2.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=I1H32XwISJG8A3Z0ravEmA.
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.